A discussion
November 2, 2023
Pre-requisites
Analysis
Evaluation
Reflection
Definition
An allocative change increases efficiency if the gainers from the change are capable of compensating the losers and still coming out ahead.
Each individual’s gain or loss is defined as the value of a hypothetical monetary compensation that would leave each individual (in his or her own judgement) indifferent to the change.
Cost-benefit analysis examines whether policy changes satisfy this compensation principle.
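As a toy illustration (the numbers here are invented, not from any appraisal), the compensation test reduces to checking that the sum of money-metric gains and losses is positive:

```r
# Hypothetical money-metric gains (+) and losses (-) for four individuals,
# each valued at the compensation that would leave them indifferent
changes <- c(10, 4, -6, -3)

# The change passes the compensation test if gainers could fully
# compensate losers and still come out ahead, i.e. the net sum is positive
sum(changes) > 0
```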
Definition
A key concept is the Switching Value: the value that a key input variable would need to take for the recommendation to switch from one option to another, or for a proposal to no longer receive funding.
Identifying switching values is crucial to decision-making.
| Variable | Value |
|---|---|
| Site area | 39 acres |
| Existing use land value estimate | £30,659 per acre |
| Future use land value estimate | £200,000 per acre |
| Land value uplift per acre | £169,341 per acre |
| Total land value uplift | £6.6m |
| Wider social benefits | £1.4m |
| Present Value Benefits (PVB) | £8m |
| Present Value Cost (PVC) | £10m |
| Benefit Cost Ratio (BCR = PVB / PVC) | 0.8 |
| Net Present Social Value (NPSV) | -£2m |
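Using the figures in the table above, a switching value can be computed directly. As an illustrative sketch (the variable names are mine), this finds the future use land value per acre at which the BCR reaches 1 and the recommendation would switch:

```r
site_area <- 39           # acres
existing_value <- 30659   # £ per acre, existing use land value
social_benefits <- 1.4e6  # £, wider social benefits
pvc <- 10e6               # £, present value cost

# For BCR = 1 we need total land value uplift = PVC - wider social benefits.
# Solving for the future use land value per acre:
switching_value <- (pvc - social_benefits) / site_area + existing_value
switching_value  # roughly £251,000 per acre, versus the £200,000 estimated
```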
Definition
Optimism bias is the demonstrated systematic tendency for appraisers to be over-optimistic about key project parameters, including capital costs, operating costs, project duration and benefits delivery.
Adjust for optimism bias to provide a realistic assessment of project estimates.
Adjustments should align with risk avoidance and mitigation measures, with robust evidence required before reductions.
Apply optimism bias adjustments to operating and capital costs. Use confidence intervals for key input variables when typical bias measurements are unavailable.
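A minimal sketch of such an adjustment (the 40% uplift here is purely illustrative, not a recommended value; the variable names are mine):

```r
estimated_capex <- 5e6  # £, appraiser's capital cost estimate (illustrative)
uplift <- 0.40          # illustrative optimism bias adjustment

# Adjusted cost used in the appraisal; the uplift can be stepped down
# only where there is robust evidence of risk mitigation
adjusted_capex <- estimated_capex * (1 + uplift)
adjusted_capex
```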
Note
Monte Carlo analysis is a simulation-based risk modelling technique that produces expected values and confidence intervals. The outputs are the result of many simulations that model the collective impact of a number of uncertainties.
It is useful when there are a number of variables with significant uncertainties, which have known, or reasonably estimated, independent probability distributions.
It requires a well estimated model of the likely impacts of an intervention and expert professional input from an operational researcher, statistician, econometrician, or other experienced practitioner.
| project_id | low | central | high |
|---|---|---|---|
| 1 | 64.37888 | 159.9989 | 223.8726 |
| 2 | 89.41526 | 133.2824 | 296.2359 |
| 3 | 70.44885 | 148.8613 | 260.1366 |
| 4 | 94.15087 | 195.4474 | 424.4830 |
| 5 | 97.02336 | 148.2902 | 471.6297 |
| 6 | 52.27782 | 189.0350 | 288.0247 |
| 7 | 76.40527 | 191.4438 | 236.4092 |
| 8 | 94.62095 | 160.8735 | 684.4839 |
| … | … | … | … |
Objective: Create functions to generate different cost distributions based on user-specified parameters.
Process:
The sample() function is used to randomly sample cost values from a sequence of possible costs, with replacement, weighted by the assumed probability distribution.
Project costs are first modeled using a uniform distribution spanning low to high.
```r
uniform_1 <- function(low, high){
  # Set of possible costs
  sequence <- seq(from = 0, to = sum(data$high), by = 1)
  # Uniform probability density over the possible costs
  distribution <- dunif(sequence, min = low, max = high)
  # Sampling from possible costs using the assumed distribution
  sample(x = sequence, size = 10000, replace = TRUE, prob = distribution)
}
```

Project costs are next modeled using a normal distribution with a mean defined as the midpoint between high and low, and a standard deviation that is 1/4 of the distance between high and low.
This means that, if the data is truly normally distributed, then the low and high estimates represent the 95% confidence interval for an individual project’s cost.
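This can be sanity-checked against the normal CDF: if the standard deviation is (high − low)/4, then low and high sit two standard deviations either side of the mean, which covers about 95.4% of the distribution:

```r
# Probability mass within +/- 2 standard deviations of the mean
coverage <- pnorm(2) - pnorm(-2)
coverage  # approximately 0.954
```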
This function looks like:
```r
normal_2 <- function(low, high){
  # Set of possible costs
  sequence <- seq(from = 0, to = sum(data$high), by = 1)
  # Mean equal to the midpoint between low and high
  mean_x <- (high - low) / 2 + low
  # Standard deviation equal to 1/4 of the distance between low and high
  sd_x <- (high - low) / 4
  # Normal probability density function
  distribution <- dnorm(sequence, mean = mean_x, sd = sd_x)
  # Sampling from possible costs using the assumed distribution
  sample(x = sequence, size = 10000, replace = TRUE, prob = distribution)
}
```

As before, except the mean of the normal distribution is assumed to be the central value.
```r
normal_3 <- function(low, central, high){
  # Set of possible costs
  sequence <- seq(from = 0, to = sum(data$high), by = 1)
  # Mean equal to the central project cost estimate
  mean_x <- central
  # Standard deviation equal to 1/4 of the distance between low and high
  sd_x <- (high - low) / 4
  # Normal probability density function
  distribution <- dnorm(sequence, mean = mean_x, sd = sd_x)
  # Sampling from possible costs using the assumed distribution
  sample(x = sequence, size = 10000, replace = TRUE, prob = distribution)
}
```

Are costs and benefits really normally distributed?
By definition, they can only be positive.
But the upper limit could be infinite?
A solution
The Log-Normal distribution allows for a right skew and long upper tail while using the same input parameters as a normal distribution.
In the context of cost estimation for a project, we can use the cumulative distribution function (CDF) of the Log-Normal distribution to calculate the mu (μ) and sigma (σ) parameters required to achieve a distribution where approximately 95% of estimates fall between the low and high cost estimates.
To achieve this, we need to establish a relationship between our central project cost estimate and the relevant formula. However, this approach relies on an assumption about what the central estimate represents.
One potential statistic that relates our three project cost estimates to the distribution parameters is the mode.
Assuming that the central cost estimate represents the most likely outcome, it corresponds to the peak of the probability distribution, making it the mode.
The mode of the Log-Normal distribution is given by the formula:
\[\text{mode} = e^{\mu - \sigma^2} = \text{central}\]
Solving for mu (μ) gives us:
\[\mu = \log(\text{mode}) + \sigma^2 = \log(\text{central}) + \sigma^2\]
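For completeness, the mode formula follows from maximising the Log-Normal density (a standard derivation, not shown in the original):

\[f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)\]

Setting \(\frac{d}{dx}\ln f(x) = -\frac{1}{x} - \frac{\ln x - \mu}{\sigma^2 x} = 0\) gives \(\ln x = \mu - \sigma^2\), i.e. \(x = e^{\mu - \sigma^2}\).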
Therefore, we need to find the value of sigma (σ) that results in approximately 95% of our project cost estimates falling between the high cost and low cost estimates.
This can be calculated by finding the difference between the Log-Normal CDF evaluated at the high cost estimate and the low cost estimate.
For a practical illustration, we can utilize the data from the first project.
First, defining an objective function, f, to be minimised.
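The definition of f is not reproduced above; a minimal, self-contained sketch consistent with the later optimize() call might look like the following (the estimates are the first project's figures from the table; the exact form used in the original is an assumption here):

```r
low     <- 64.37888   # first project's low cost estimate
central <- 159.9989   # first project's central cost estimate
high    <- 223.8726   # first project's high cost estimate

# Hypothetical reconstruction: choose sigma so that ~95% of the log-normal
# mass lies between the low and high estimates, with mu tied to the mode
# relationship mu = log(central) + sigma^2
f <- function(sigma) {
  mu <- log(central) + sigma^2
  coverage <- plnorm(high, meanlog = mu, sdlog = sigma) -
              plnorm(low,  meanlog = mu, sdlog = sigma)
  abs(coverage - 0.95)
}
```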
```r
# Using optimize() to search the interval from lower to upper for a
# minimum of the function f with respect to its first argument, sigma
optimize(f, lower = 0, upper = 1)
# Selecting the minimum from the returned list; this is the optimal sigma
sigma_test <- optimize(f, lower = 0, upper = 1)$minimum
# Plugging this back into the formula for the mean
mu_test <- log(data$central[1]) + sigma_test^2
# Now using these to simulate a distribution
N <- 10000000
nums <- rlnorm(N, mu_test, sigma_test)
# Now testing how many values lie between Low and High
sum(data$low[1] < nums & nums < data$high[1]) / N
```

```
[1] 0.95007
```